Abstract

Hyperspectral image (HSI) classification is a highly technical remote
sensing tool. The goal is to reproduce a thematic map that is compared
with a reference ground truth map (GT), constructed by inspecting the
region. The HSI contains more than a hundred bidirectional measures,
called bands (or simply images), of the same region, taken at juxtaposed
frequencies. Unfortunately, some bands contain redundant information,
others are affected by noise, and the high dimensionality of the features
lowers the classification accuracy. The problem is how to find the good
bands for classifying the pixels of the region. Some methods use Mutual
Information (MI) and a threshold to select relevant bands, without
treating redundancy. Others control and eliminate redundancy by selecting
the band with the highest MI; if its neighbors have nearly the same MI
with the GT, they are considered redundant and discarded. This is the
main drawback of that method, because it forfeits the advantage of
hyperspectral images: some precious information can be discarded. In this
paper we accept useful redundancy. A band contains useful redundancy if it
contributes to producing an estimated reference map that has a higher MI
with the GT. To control redundancy, we introduce a complementary
threshold added to the last value of MI. This process is a Filter
strategy; it gives better classification accuracy and is inexpensive, but
it is less performant than a Wrapper strategy.

Keywords: Hyperspectral images, classification, feature selection, mutual 
information, redundancy 
1 Introduction

In the feature classification domain, the choice of data widely affects
the results. For a hyperspectral image, not all bands carry information;
some bands are irrelevant, like those affected by various atmospheric
effects (see Figure 4), and decrease the classification accuracy. There
also exist redundant bands that complicate the learning system and
produce incorrect predictions [14]. Even when the bands contain enough
information about the scene, they may fail to predict the classes
correctly if the dimension of the image space (see Figure 3) is so large
that many samples are needed to detect the relationship between the bands
and the scene (Hughes phenomenon) [10]. We can reduce the dimensionality
of hyperspectral images by selecting only the relevant bands (feature
selection or subset selection methodology), or by extracting, from the
original bands, new bands containing the maximal information about the
classes, using any functions, logical or numerical (feature extraction
methodology) [11] [9]. Here we focus on feature selection using mutual
information. Hyperspectral images have three advantages over
multispectral images [6], see Figure 1.
First: a hyperspectral image contains more than a hundred images, whereas
a multispectral image contains three to ten.

Second: a hyperspectral image has a spectral resolution (the central
wavelength divided by the width of the spectral band) of about a hundred,
whereas that of a multispectral image is about ten.

Third: the bands of a hyperspectral image are regularly spaced, whereas
those of a multispectral image are wide and irregularly spaced.

Assertion: when we reduce the dimensionality of hyperspectral images, any
method used must preserve the precision and the high discrimination of
substances provided by the hyperspectral image.
Figure 1: Precision and discrimination added by hyperspectral images



In this paper we use the hyperspectral image AVIRIS 92AV3C (Airborne
Visible Infrared Imaging Spectrometer) [2]. It contains 220 images taken
over the region "Indiana Pine" in "north-western Indiana", USA [1]. These
220 images, called bands, are taken between 0.4 µm and 2.5 µm. Each band
has 145 lines and 145 columns. The ground truth map is also provided, but
only 10366 pixels are labeled from 1 to 16. Each label indicates one of
16 classes. The zeros indicate pixels that are not yet classified
(Figure 2).

The hyperspectral image AVIRIS 92AV3C contains numbers between 955 and
9406. Each pixel of the ground truth map has a set of 220 numbers
(measures) along the hyperspectral image. These numbers (measures)
represent the reflectance of the pixel in each band. So the pixel is
represented as a vector of 220 components (Figure 3).
Figure 2: The Ground Truth map of AVIRIS 92AV3C and the 16 classes

Figure 3 shows the notion of pixel vector [7]. So reducing dimensionality
means selecting only the dimensions carrying a lot of information about
the classes.
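
The pixel-vector notion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-list cube below is a tiny stand-in (3 bands of 2 x 2 pixels), not the AVIRIS data, and `pixel_vector` is a hypothetical helper name. For the real image the same indexing would apply to 220 bands of 145 x 145 pixels.

```python
# Toy hyperspectral cube stored as cube[band][row][col].
def pixel_vector(cube, row, col):
    """Return the measures of pixel (row, col) in every band."""
    return [band[row][col] for band in cube]

# 3 bands of 2 x 2 pixels (illustrative numbers only).
cube = [
    [[955, 1200], [3000, 4100]],
    [[1010, 1300], [3100, 4000]],
    [[990, 1250], [2900, 9406]],
]

v = pixel_vector(cube, 1, 1)
print(v)  # [4100, 4000, 9406]
```

Selecting bands then amounts to keeping only some components of every such vector.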
Figure 3: The notion of pixel vector
We can also note that not all bands carry information. In Figure 4, for
example, we can see the effects of the atmosphere on bands 155, 220 and
others. This hyperspectral image thus illustrates the problem of
dimensionality reduction.



2 Mutual Information based feature selection 
2.1 Definition of mutual information 

This is a measure of the information exchanged between two ensembles of
random variables A and B:

$$I(A,B) = \sum p(A,B) \, \log_2 \frac{p(A,B)}{p(A)\,p(B)}$$

Considering the ground truth map, and bands as ensembles of random vari- 
ables, we calculate their interdependence. 
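
As a minimal sketch, the definition above can be evaluated from co-occurrence counts when both the ground truth and the band take discrete values. The function name `mutual_information` and the toy data are assumptions made for illustration; real reflectances would first have to be binned.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """I(A,B) in bits for two equal-length sequences of discrete values."""
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    count_ab = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_a[x] * count_b[y]))
    return mi

# Toy ground truth (3 classes) and a toy band that only separates class 0.
gt = [0, 0, 1, 1, 2, 2, 2, 0]
band = [10, 10, 20, 20, 20, 20, 20, 10]
print(mutual_information(gt, band))
```

A constant band gives zero mutual information, and a band identical to the ground truth gives the entropy of the ground truth.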

Fano [14] has demonstrated that as the mutual information of the already
selected features increases, the error probability of classification
decreases, according to the formula below:

$$\frac{H(C/X) - 1}{\log_2(N_c)} \leq P_e \leq \frac{H(C/X)}{\log 2}$$

with:

$$\frac{H(C/X) - 1}{\log_2(N_c)} = \frac{H(C) - I(C;X) - 1}{\log_2(N_c)}$$

and:

$$P_e \leq \frac{H(C) - I(C;X)}{\log 2} = \frac{H(C/X)}{\log 2}$$

The conditional entropy H(C/X) is calculated between the ground truth map
(i.e. the classes C) and the subset of candidate bands X. Nc is the
number of classes. So when the features X have a higher value of mutual
information with the ground truth map (i.e. are nearer to the ground
truth map), the error probability will be lower. But it is difficult to
compute this joint mutual information I(C;X), given the high
dimensionality [14].
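
To make the role of I(C;X) concrete, the left-hand side of the bound, (H(C) - I(C;X) - 1) / Log2(Nc), can be evaluated numerically. The sketch below is illustrative only: `fano_lower_bound` is a hypothetical name, the entropy and MI values are made up, and the clamp at zero is an addition, not part of the formula.

```python
from math import log2

def fano_lower_bound(h_c, mi_cx, n_classes):
    """Left-hand side of the bound: (H(C) - I(C;X) - 1) / log2(Nc)."""
    h_c_given_x = h_c - mi_cx  # H(C/X) = H(C) - I(C;X)
    # Clamped at 0 because a probability cannot be negative (added here).
    return max(0.0, (h_c_given_x - 1.0) / log2(n_classes))

# Illustrative values only: H(C) = 3.5 bits, Nc = 16 classes.
low_mi = fano_lower_bound(3.5, 0.5, 16)
high_mi = fano_lower_bound(3.5, 2.0, 16)
print(low_mi, high_mi)  # 0.5 0.125
```

With everything else fixed, a larger I(C;X) lowers the bound on the error probability.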

Guo [3] also uses the average of bands 170 to 210 to produce an estimated
ground truth map, and uses it instead of the real ground truth map. Their
curves are similar, as shown in Figure 4.
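
A minimal sketch of this averaging, assuming each band is stored as a flat list of pixel values and bands are numbered from 1; `average_bands` is a hypothetical helper and the toy cube has only four bands.

```python
def average_bands(cube, first, last):
    """Pixel-wise mean of bands first..last (1-based, inclusive)."""
    chosen = cube[first - 1:last]
    n_pixels = len(chosen[0])
    return [sum(band[i] for band in chosen) / len(chosen)
            for i in range(n_pixels)]

# Toy cube: 4 bands of 3 pixels (illustrative numbers only).
cube = [[1, 2, 3], [10, 20, 30], [30, 40, 50], [7, 7, 7]]
estimate = average_bands(cube, 2, 3)
print(estimate)  # [20.0, 30.0, 40.0]
```

On the real image the call would be `average_bands(cube, 170, 210)`.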
Figure 4: Mutual information of AVIRIS with the Ground Truth map (solid
line) and with the ground truth approximated by averaging bands 170 to 210
(dashed line).

3 The principle of the proposed algorithm

3.1 Case of synthetic bands 

• Band A contains only the class number 11 (Soybeans-min) 

• Band B contains only the class number 14 (Woods). 

• Band C contains the classes number 11 and 14.

Now we calculate the mutual information between each of them and the
GT. We also compute the MI between the GT and the superposition of C and
B. The results are shown in Table 1.



Table 1: MI of GT with synthetic bands and the accuracy of classification in
each case

Synthetic bands   A      B      C      A,C    B,C    A,B    A,B,C
MI                0.52   0.33   0.84   0.84   0.84   0.84   0.84
Accuracy (%)      24.6   12.6   36.6   36.6   36.6   36.6   36.6

Figure 5: Three synthetic bands chosen to illustrate the principle of the
algorithm.

3.2 Comments 

Table 1 allows us to comment on two cases:

• Case 1: Bands A and B are superposed to produce band C. The MI of the
estimated reference map C with the GT increases: information has been
added.

• Case 2: Bands A and C are superposed to produce an estimated reference
map, and we compute its MI with the GT. The MI value does not change:
band A added only redundant information. The same holds when we
superpose B and C, and also when we superpose more redundant bands
(A, B and C): no information is added.
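
The two cases can be reproduced on a toy example. The sketch below is an illustration under assumptions: the ground truth is a made-up list of 16 labels (not the AVIRIS map), superposition is taken as a pixel-wise average, and the resulting MI values are not those of Table 1.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """I(A,B) in bits for two equal-length sequences of discrete values."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum((c / n) * log2(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in Counter(zip(a, b)).items())

def superpose(x, y):
    """Pixel-wise average of two bands."""
    return [(p + q) / 2 for p, q in zip(x, y)]

# Toy ground truth containing classes 11 and 14 plus two other classes.
gt = [11] * 4 + [14] * 3 + [2] * 5 + [6] * 4
band_a = [v if v == 11 else 0 for v in gt]         # only class 11
band_b = [v if v == 14 else 0 for v in gt]         # only class 14
band_c = [p + q for p, q in zip(band_a, band_b)]   # classes 11 and 14

mi_a = mutual_information(gt, band_a)
mi_c = mutual_information(gt, band_c)
mi_ac = mutual_information(gt, superpose(band_a, band_c))

print(mi_c > mi_a)                # Case 1: information added
print(abs(mi_ac - mi_c) < 1e-12)  # Case 2: A is redundant with C
```

The second comparison holds because averaging A with C separates exactly the same groups of pixels as C alone.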

3.3 Partial conclusion 

First: There is an important observation: the superposition of bands A
and B to construct C can be interpreted as the construction of an
estimated reference map by averaging the latest estimate and the
candidate band to be selected.

Second: We can state this rule: a band is relevant to classification if
it contributes to producing an estimated reference map whose mutual
information with the ground truth map increases; otherwise it must be
discarded.

4 Proposed algorithm 

Our idea is based on this observation: the band that has the highest
value of mutual information with the ground truth map can be a good
approximation of it. So we consider that the subset of selected bands is
good if it can generate an estimated reference map that is nearly equal
to the ground truth map. We generate the estimated reference map by
averaging the latest estimate and the candidate band. So it is a Filter
approach [16] [13].

Our band selection process is equivalent to the following steps: we order
the bands according to the value of their mutual information with the
ground truth map. Then we initialize the ensemble of selected bands with
the band that has the highest value of MI. At each point of the process,
we build an approximated reference map C_est by averaging the latest one
and the candidate band, and we compute MI(C_est,GT). The latest band
added (to those already selected) must make MI(C_est,GT) increase;
otherwise it is discarded from the retained ensemble. Then we introduce a
complementary threshold Th to control redundancy. So the band to be
selected must make MI increase by a step greater than Th. The following
algorithm gives more detail of the process:



Algorithm 1: Let SS be the ensemble of bands already selected and S the
candidate band to be selected. SS is initially empty; R is the ensemble of
candidate bands, it initially contains all bands (1..220). MI is
initialized with a value MI*, X is the number of bands to be selected and
Th the threshold controlling redundancy:

1) Select the first band to initialize C_est:
   Select band index S = argmax_{s in R} MI(s);
   SS <- S;
   R <- R \ S;
   C_est0 = Band(S);

2) Selection process:
   while |SS| < X do
      Select band index S = argmax_{s in R} MI(s) and R <- R \ S;
      C_est = (C_est0 + Band(S)) / 2;   // C_est = Build_estimated_C
      MI = Mutual_Information(GT, C_est)
      if MI > MI* + Th then
         MI* = MI;
         C_est0 = C_est;
         SS <- SS ∪ S;
      end if
   end while
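
The selection loop can be sketched in Python as follows. This is a minimal illustration, not a reference implementation, and it rests on several assumptions that the text does not fix: MI* is initialized with the MI of the first selected band, averaged maps are binned with a hypothetical `quantize` helper before MI is estimated, and the loop also stops when no candidate band is left. Bands are flat lists of pixel values and band indices are 0-based.

```python
from collections import Counter
from math import floor, log2

def mutual_information(a, b):
    """I(A,B) in bits for two equal-length sequences of discrete values."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum((c / n) * log2(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in Counter(zip(a, b)).items())

def quantize(values, step=1.0):
    """Bin real values so MI can be estimated from counts (assumed binning)."""
    return [floor(v / step) for v in values]

def select_bands(bands, gt, n_select, th, step=1.0):
    """Return the indices of the retained bands, in order of selection."""
    mi_band = [mutual_information(gt, quantize(b, step)) for b in bands]
    # R: candidate bands ordered by decreasing MI with the ground truth.
    remaining = sorted(range(len(bands)), key=lambda s: mi_band[s],
                       reverse=True)
    first = remaining.pop(0)
    selected = [first]
    c_est0 = list(bands[first])
    mi_star = mi_band[first]  # assumed initial value of MI*
    # Extra guard (not in the pseudocode): stop when R is empty.
    while len(selected) < n_select and remaining:
        s = remaining.pop(0)
        c_est = [(u + v) / 2 for u, v in zip(c_est0, bands[s])]
        mi = mutual_information(gt, quantize(c_est, step))
        if mi > mi_star + th:
            mi_star, c_est0 = mi, c_est
            selected.append(s)
    return selected

# Toy data: 4 classes, 4 bands of 12 pixels (illustrative only).
gt = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
bands = [
    [5, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9, 9],
    [1, 1, 1, 7, 7, 7, 7, 7, 7, 7, 7, 7],
    [3] * 12,
    [2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8],
]
print(select_bands(bands, gt, 4, -4.0))  # [3, 0, 1, 2]
print(select_bands(bands, gt, 3, 10.0))  # [3]
```

In this toy run a very negative threshold keeps every band in MI order, while a large positive one keeps only the first band.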
5 Results and Discussion

We apply this algorithm to the hyperspectral image AVIRIS 92AV3C [1]. 50%
of the labeled pixels are randomly chosen and used for training, and the
other 50% are used for testing the classification [3]. The classifier
used is the SVM [5] [12] [4].
We had to choose negative values of Th. This means that it is impossible
to increase the classification accuracy without allowing redundancy.
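
The random half-and-half split of the labeled pixels can be sketched as below. The function name, the fixed seed and the toy label list are assumptions made for illustration; the SVM training itself is not shown.

```python
import random

def split_labeled_pixels(gt, train_fraction=0.5, seed=0):
    """Split the indices of labeled pixels (label != 0) into train and test."""
    labeled = [i for i, label in enumerate(gt) if label != 0]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(labeled)
    cut = int(len(labeled) * train_fraction)
    return sorted(labeled[:cut]), sorted(labeled[cut:])

# Toy ground truth: zeros are unlabeled pixels and are left out of both sets.
gt_toy = [0, 1, 2, 0, 3, 3, 1, 0, 2, 2]
train, test = split_labeled_pixels(gt_toy)
print(len(train), len(test))  # 3 4
```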



5.1 Results 

Table 2 shows the results obtained for several thresholds. We can see the
effectiveness of the band selection performed by our algorithm, and the
important effect of avoiding redundancy.

Table 2: Classification accuracy (%) illustrating the elimination of
redundancy by the proposed algorithm, for several thresholds (Th)

Bands      Th
retained   -0.02    -0.01    -0.005   -0.004   -0.0035  (unlabeled)
2          47.44    47.44    47.44    47.44    47.44    47.44
3          47.87    47.87    47.87    47.87    47.87    48.92
4          49.31    49.31    49.31    49.31    49.31
12         56.30    56.30    56.30    56.30    60.76
14         57.00    57.00    57.00    57.00    61.80
18         59.09    59.09    59.09    59.09    63.00
20         63.08    63.08    63.08    63.53
25         66.12    64.89    64.89    65.38
30         73.54    70.72    70.72    67.68
33         73.72    74.79    75.65
35         76.06    74.72    75.59
36         76.49    76.60    76.19
40         78.96    79.29
45         80.85    81.01
50         81.63    81.12
53         82.27    86.03
60         82.74    85.08
70         86.95
75         86.81
80         87.28
83         88.14

5.2 Analysis and Discussion

Important: When we apply our algorithm to real data, here AVIRIS 92AV3C,
we note that we cannot increase the accuracy without allowing redundancy
through negative thresholds. But the idea is sound: the algorithm is
selective and the threshold effectively controls the redundancy:
First: For the relatively highest threshold values (-0.0035, -0.001, 0
and positive values) we obtain a hard selection: only a few of the most
informative bands are selected.
Second: For the medium threshold values (-0.01, -0.005, -0.004), some
redundancy is allowed, even if it is harmful (negative threshold values),
in order to increase the classification accuracy.

Third: As the threshold value becomes more negative (-0.02), the
redundancy allowed becomes useless: we have the same accuracy with more
bands.
Finally: for the most negative thresholds (for example -4), we allow all
bands to be selected, and the algorithm has no effect. For such
thresholds, this corresponds to selection by ordering bands on mutual
information. The performance is low.

We can note here that Hui Wang [15] uses two axioms to characterize
feature selection. The sufficiency axiom: the selected feature subset must
be able to reproduce the training samples without losing information. The
necessity axiom: "simplest among different alternatives is preferred for
prediction". In the proposed algorithm, reducing the error probability
between the truth map and the estimated one minimizes the information
lost for the training samples and also for the predicted ones.

We also note that we can use the number of selected features as the
condition to stop the search [16].

Partial conclusion: The proposed algorithm effectively reduces the
dimensionality of hyperspectral images.

6 Conclusion 

In this paper we presented the necessity of reducing the number of bands
in the classification of hyperspectral images. Then we showed the
effectiveness of mutual information for selecting bands able to classify
the pixels of the ground truth. We also insisted on preserving the
properties of hyperspectral images relative to multispectral images when
we reduce dimensionality. We introduced an algorithm based on mutual
information. To be chosen, a band must contribute to reproducing an
estimated ground truth map closer to the reference map. A complementary
threshold is added to avoid redundancy. So each band retained has to
reproduce an estimated ground truth map closer to the reference map by a
step equal to the threshold, even if it carries redundant information.
But the method used here to estimate the reference map plays an important
role: here, with averaging of bands, we are constrained to use negative
threshold values, so we allow more redundancy. We can say that we
conserve the useful redundancy by adjusting the complementary threshold.
This algorithm is a feature selection methodology, and it is a Filter
approach. It is less expensive. It can be implemented in real-time
applications. This scheme is very interesting to investigate and improve,
considering its performance.